The data files were merged and restructured:
Does this make sense? We could compare different types of policies using the metadata from our data collection.
Include table of how we determined the number of universities to sample per country.
Alternative variant of tile plot
Number of criteria per country
| country | n_unis | n_criteria | criteria_found | proportion_of_all_criteria |
|---|---|---|---|---|
| Austria | 6 | 17 | 29 | 28% |
| Brazil | 12 | 17 | 61 | 30% |
| Germany | 12 | 17 | 64 | 31% |
| United Kingdom | 24 | 17 | 109 | 27% |
| India | 12 | 17 | 33 | 16% |
| Portugal | 6 | 17 | 27 | 26% |
| United States | 35 | 17 | 106 | 18% |
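The last column appears to be `criteria_found` divided by the number of possible university-criterion pairs (`n_unis * n_criteria`). The following is a minimal sketch of that calculation, using the counts from the table. The column names `possible`, `proportion`, and `proportion_pct` are introduced here for illustration, and the formula is an assumption that happens to reproduce the reported percentages.

```r
# Counts copied from the table above.
criteria <- data.frame(
  country = c("Austria", "Brazil", "Germany", "United Kingdom",
              "India", "Portugal", "United States"),
  n_unis = c(6, 12, 12, 24, 12, 6, 35),
  n_criteria = 17,
  criteria_found = c(29, 61, 64, 109, 33, 27, 106)
)

# Assumed denominator: every university could mention every criterion.
criteria$possible <- criteria$n_unis * criteria$n_criteria
criteria$proportion <- criteria$criteria_found / criteria$possible
criteria$proportion_pct <- round(100 * criteria$proportion)

print(criteria[, c("country", "criteria_found", "possible", "proportion_pct")])
```

Normalising by `possible` rather than reporting raw counts keeps countries with different numbers of sampled universities comparable.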
Problem with the above figure: there are only 6 data points per y-value (code), which a boxplot might obscure. We should probably show the individual points, or maybe just use vertical bars for each country.
The following figure shows the same information as above in a form that is easier to read directly, for example to find the exact number of universities that mention a specific indicator.
The same information, displayed by country.
Do the same for the US only.
## Joining, by = c("country", "university", "level", "status")
There is not much difference here.
Conclusions:
Display significance levels (.05), although they are probably not meaningful given the non-random sample. P values were adjusted using the Benjamini, Hochberg, and Yekutieli methods to control the false discovery rate.
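As a rough illustration of false discovery rate adjustment, base R's `p.adjust()` offers both the Benjamini-Hochberg (`"BH"`) and the Benjamini-Yekutieli (`"BY"`) procedures. The p-values below are hypothetical and only show the mechanics. Which of the two methods was applied in this analysis is not stated here, so the sketch simply shows both.

```r
# Hypothetical raw p-values for illustration only; not results from this analysis.
p_raw <- c(0.001, 0.008, 0.020, 0.041, 0.300)

# Benjamini-Hochberg: controls the FDR under independence or positive dependence.
p_bh <- p.adjust(p_raw, method = "BH")

# Benjamini-Yekutieli: controls the FDR under arbitrary dependence (more conservative).
p_by <- p.adjust(p_raw, method = "BY")

adjusted <- data.frame(
  p_raw, p_bh, p_by,
  sig_bh = p_bh < 0.05,
  sig_by = p_by < 0.05
)
print(adjusted)
```

Because correlations computed on the same set of indicators are not independent, the more conservative BY variant may be the safer choice, though neither repairs the non-random sample.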
Now repeat the correlation analysis for the US only.
## Warning in cor(., use = "pairwise.complete.obs"): the standard deviation is zero
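The warning above typically means that at least one indicator is constant within the subset (for instance, no US university mentions it), so its correlations are undefined and come back as `NA`. A minimal sketch of how such columns could be detected and dropped before calling `cor()` follows. The data frame `indicators` and its values are hypothetical.

```r
# Toy binary indicator matrix (hypothetical values): rows are universities,
# columns are criteria. `patents` is never mentioned, so its variance is zero.
indicators <- data.frame(
  citations = c(1, 0, 1, 1, 0),
  software  = c(0, 0, 1, 0, 1),
  patents   = c(0, 0, 0, 0, 0)
)

# Standard deviation per column; zero (or NA) marks a constant column.
col_sd <- vapply(indicators, sd, numeric(1), na.rm = TRUE)
constant_cols <- names(col_sd)[is.na(col_sd) | col_sd == 0]

# Keep only columns that vary, then correlate.
varying <- indicators[, !(names(indicators) %in% constant_cols), drop = FALSE]
cor_mat <- cor(varying, use = "pairwise.complete.obs")

print(constant_cols)
print(cor_mat)
```

Reporting the dropped columns alongside the correlation plot makes it clear that they are absent because of missing variation rather than by choice.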
Conclusions:
## Principal Components Analysis
## Call: psych::principal(r = x, nfactors = n, rotate = rotate)
## Standardized loadings (pattern matrix) based upon correlation matrix
## item RC2 RC1 RC3 RC4 h2 u2 com
## Gender of reviewers 8 0.93 0.88 0.12 1.0
## Gender equality 7 0.89 0.82 0.18 1.1
## Gender balance of reviewers 6 0.86 0.75 0.25 1.0
## Number of publications 10 0.37 0.14 0.86 1.1
## Engagement with policy makers 4 0.81 0.66 0.34 1.0
## Engagement with the public 5 0.74 0.55 0.45 1.0
## Engagement with industry 3 0.74 0.57 0.43 1.1
## Service to profession 14 -0.31 0.56 0.44 0.56 1.8
## Review & editorial activities 13 0.50 0.30 0.38 0.62 2.1
## Patents 11 0.75 0.27 0.64 0.36 1.3
## Software 15 0.24 0.70 0.56 0.44 1.3
## Publication quality 12 0.42 -0.48 0.34 0.54 0.46 2.9
## Citizen science 2 0.31 0.40 0.26 0.74 1.9
## Citations 1 0.80 0.68 0.32 1.1
## Journal metrics 9 0.31 0.69 0.59 0.41 1.5
##
## RC2 RC1 RC3 RC4
## SS loadings 2.69 2.67 1.72 1.39
## Proportion Var 0.18 0.18 0.11 0.09
## Cumulative Var 0.18 0.36 0.47 0.56
## Proportion Explained 0.32 0.31 0.20 0.16
## Cumulative Proportion 0.32 0.63 0.84 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 4 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.08
## with the empirical chi square 151.77 with prob < 6.1e-12
##
## Fit based upon off diagonal values = 0.85
Maybe a correspondence analysis could help, for instance to visualise the initial figure (tile plot). However, one must be careful, since the sample sizes differ between countries. Does that matter? Another option is a correspondence analysis of all variables against all variables to see how they interrelate, as an alternative to the PCA, which is debatable given the binary data.
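To make the idea concrete, here is a minimal sketch of a simple correspondence analysis in base R, computed through the singular value decomposition of the standardised residuals. The `counts` matrix (countries A to D, three criteria) is entirely hypothetical. On the sample-size question: correspondence analysis compares row profiles, so a country with more universities gets a larger mass (weight) but its profile stays comparable to the others.

```r
# Hypothetical counts of universities mentioning each criterion; illustration only.
counts <- matrix(
  c( 4, 2,  1,
     9, 5,  3,
     3, 8,  2,
    10, 6, 12),
  nrow = 4, byrow = TRUE,
  dimnames = list(country = c("A", "B", "C", "D"),
                  criterion = c("citations", "software", "engagement"))
)

P <- counts / sum(counts)   # correspondence matrix
row_mass <- rowSums(P)      # larger samples get larger mass
col_mass <- colSums(P)

# Standardised residuals from the independence model.
S <- diag(1 / sqrt(row_mass)) %*%
  (P - outer(row_mass, col_mass)) %*%
  diag(1 / sqrt(col_mass))

decomp <- svd(S)
inertia <- decomp$d^2       # principal inertias per dimension

# Principal coordinates for rows (countries) and columns (criteria).
row_coords <- diag(1 / sqrt(row_mass)) %*% decomp$u %*% diag(decomp$d)
col_coords <- diag(1 / sqrt(col_mass)) %*% decomp$v %*% diag(decomp$d)
rownames(row_coords) <- rownames(counts)
rownames(col_coords) <- colnames(counts)

print(round(inertia / sum(inertia), 3))  # share of inertia per dimension
print(round(row_coords[, 1:2], 3))
print(round(col_coords[, 1:2], 3))
```

Plotting the first two columns of `row_coords` and `col_coords` together gives the usual biplot. Note that mention counts are not a strict contingency table, since one university can mention several criteria, so the result would be descriptive only.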
Variables to collate: Data, OA, Citizen Science, Software, Gender equality, three forms of engagement.
| country_name | mean | sd | se | mean_plus_se | mean_minus_se | Lower | Upper |
|---|---|---|---|---|---|---|---|
| Austria | 1.33 | 1.51 | 0.61 | 1.95 | 0.72 | 0.33 | 2.50 |
| Brazil | 1.92 | 1.38 | 0.40 | 2.31 | 1.52 | 1.25 | 2.67 |
| Germany | 1.25 | 1.22 | 0.35 | 1.60 | 0.90 | 0.58 | 1.92 |
| India | 0.33 | 0.65 | 0.19 | 0.52 | 0.15 | 0.00 | 0.75 |
| Portugal | 1.83 | 0.41 | 0.17 | 2.00 | 1.67 | 1.50 | 2.00 |
| United Kingdom | 1.79 | 1.28 | 0.26 | 2.05 | 1.53 | 1.29 | 2.29 |
| United States | 0.69 | 1.21 | 0.20 | 0.89 | 0.48 | 0.31 | 1.09 |
## Joining, by = "country_name"
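The table above holds two kinds of interval per country: the mean plus and minus one standard error (`se = sd / sqrt(n)`), and the `Lower` and `Upper` columns, whose values are consistent with a percentile bootstrap of the mean. The sketch below shows both calculations in base R. The `scores` vector is hypothetical, and the bootstrap is an assumption about how `Lower` and `Upper` were obtained, not a confirmed description of the original code.

```r
set.seed(1)

# Check against the table: Austria has sd = 1.5055453 with n = 6 universities.
austria_se <- 1.5055453 / sqrt(6)

# Hypothetical per-university scores for one country (illustration only).
scores <- c(0, 0, 1, 1, 2, 4)
n <- length(scores)
m <- mean(scores)

# Standard error of the mean and the mean +/- 1 SE band.
se <- sd(scores) / sqrt(n)
se_band <- c(lower = m - se, upper = m + se)

# Percentile bootstrap: resample universities with replacement,
# recompute the mean, and take the 2.5% and 97.5% quantiles.
boot_means <- replicate(1000, mean(sample(scores, size = n, replace = TRUE)))
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

print(austria_se)
print(se_band)
print(boot_ci)
```

With only 6 to 35 universities per country, the bootstrap interval is usually the more honest display, since the mean plus and minus one standard error covers far less than 95% of the sampling uncertainty.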